class: title-slide

# ER014 - Data Science & Strategy for Business
## PVA3
### Part 2: Case Study
<br>
<br>
<br>
<br>
<br>
<br>
<br>
### Spring Semester 2025
<br>
### Prof. Dr. Jörg Schoder
.mycontacts[
@FFHS-EconomicResearch
@jfschoder
]

---
layout: true

<div class="my-footer"></div>
<div style="position: absolute;left:400px;bottom:10px;font-size:9px">
Prof. Dr. Jörg Schoder</div>

---
class: left

.blockquote[Case Study: Customer Retention]

## Customer Retention & Churn Analysis: Retail Banking Data

📝 **Tasks:**

1. Import the data from Moodle or from our [GitHub repo](https://github.com/FFHS-EconomicResearch/ER014/blob/main/data/raw/Churn_Modelling.csv).
2. Identify the variable types. Which variable is suitable as the dependent variable for a logistic regression on customer retention?

???

* 14 variables with demographic and customer-specific information
* Possible goal: predicting the variable Exited (0 = customer stays, 1 = customer leaves)
* Data from [kaggle](https://www.kaggle.com/code/mervetorkan/churn-prediction)
* RowNumber: corresponds to the record (row) number and has no effect on the output.
* CustomerId: contains random values and has no effect on whether a customer leaves the bank.
* Surname: the surname of a customer has no impact on their decision to leave the bank.
* CreditScore: can have an effect on customer churn, since a customer with a higher credit score is less likely to leave the bank.
* Geography: a customer's location can affect their decision to leave the bank.
* Gender: it is interesting to explore whether gender plays a role in a customer leaving the bank.
* Age: this is certainly relevant, since older customers are less likely to leave their bank than younger ones.
* Tenure: the number of years that the customer has been a client of the bank. Normally, longer-standing clients are more loyal and less likely to leave a bank.
* Balance: also a very good indicator of customer churn, as people with a higher account balance are less likely to leave the bank than those with a lower balance.
* NumOfProducts: the number of products that a customer has purchased through the bank.
* HasCrCard: denotes whether or not a customer has a credit card. This column is also relevant, since people with a credit card are less likely to leave the bank.
* IsActiveMember: active customers are less likely to leave the bank.
* EstimatedSalary: as with balance, people with lower salaries are more likely to leave the bank than those with higher salaries.
* Exited: whether or not the customer left the bank (0 = no, 1 = yes).

---
class: left

.blockquote[Case Study: Customer Retention]

## Customer Retention & Churn Analysis: First Insights

.panelset[
.panel[.panel-name[Overall]

``` r
# Summary of the target variable: counts and shares (in %) of Exited
tbl_churn %>%
  count(Exited) %>%
  mutate(Prozent = n / sum(n) * 100)
```
]
.panel[.panel-name[Gender]

``` r
# Number of customers, number of churners and churn rate (in %) by gender
tbl_churn %>%
  group_by(Gender) %>%
  summarise(Anzahl = n(),
            Churn_Anzahl = sum(Exited),
            Churn_Rate = mean(Exited) * 100)
```
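]
.panel[.panel-name[Gender & Country]

``` r
# Same summary, broken down by gender and country
tbl_churn %>%
  group_by(Gender, Geography) %>%
  summarise(Anzahl = n(),
            Churn_Anzahl = sum(Exited),
            Churn_Rate = mean(Exited) * 100,
            .groups = "drop")  # return an ungrouped tibble, no grouping message
```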
]
]

---
class: left

.blockquote[Case Study: Customer Retention]

## Customer Retention & Churn Analysis: EDA

📝 **Task:** Run the following code block and interpret the results.

``` r
# Group size and group means of key customer characteristics by gender
tbl_churn %>%
  group_by(Gender) %>%
  summarise(num_customers = n(),
            mean_crScore = round(mean(CreditScore), 2),
            mean_age = round(mean(Age), 2),
            mean_tenure = round(mean(Tenure), 2),
            mean_balance = round(mean(Balance), 2),
            mean_salary = round(mean(EstimatedSalary), 2))
```
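
The code blocks in this case study assume that the data are already available as a tibble named `tbl_churn`. A minimal sketch of the import step follows. The raw-file URL is an assumption derived from the GitHub link in the first task, and `read_churn()` is a hypothetical helper that is not part of the original slides.

``` r
library(readr)

# Assumption: raw-file counterpart of the GitHub link given in the task
churn_url <- "https://raw.githubusercontent.com/FFHS-EconomicResearch/ER014/main/data/raw/Churn_Modelling.csv"

# Hypothetical helper: reads the CSV from a URL or a local path and
# checks that the target variable Exited is present
read_churn <- function(path) {
  tbl <- read_csv(path, show_col_types = FALSE)
  stopifnot("Exited" %in% names(tbl))
  tbl
}

# Usage (needs internet access; a local copy from Moodle works the same way):
# tbl_churn <- read_churn(churn_url)
# str(tbl_churn)   # lists each variable with its type
```

Checking for `Exited` right after the import gives an early error if the wrong file was read.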
---
class: left

.blockquote[Case Study: Customer Retention]

## Customer Retention & Churn Analysis: EDA Sociodemographics

📝 **Task:** Create diagrams 1-3 shown in the panels with the **ggplot2** package and interpret them.

.panelset[
.panel[.panel-name[Diagram 1]

<img src="data:image/png;base64,#02_Fallbeispiel_Churn_files/figure-html/unnamed-chunk-11-1.png" width="45%" style="display: block; margin: auto;" />

]
.panel[.panel-name[Diagram 2]

<img src="data:image/png;base64,#02_Fallbeispiel_Churn_files/figure-html/unnamed-chunk-12-1.png" width="45%" style="display: block; margin: auto;" />

]
.panel[.panel-name[Diagram 3]

<img src="data:image/png;base64,#02_Fallbeispiel_Churn_files/figure-html/unnamed-chunk-13-1.png" width="45%" style="display: block; margin: auto;" />

]
]

???

Diagram 1 alternative

---
class: left

.blockquote[Case Study: Customer Retention]

## Customer Retention & Churn Analysis: EDA Scatterplot Matrix

<img src="data:image/png;base64,#02_Fallbeispiel_Churn_files/figure-html/unnamed-chunk-16-1.png" width="60%" style="display: block; margin: auto;" />

???

📝 **Task:** Interpret the statistic and the diagram.

``` r
# Summary of the target variable: counts and shares (in %) of Exited
tbl_churn %>%
  count(Exited) %>%
  mutate(Prozent = n / sum(n) * 100)
```

```
## # A tibble: 2 × 3
##   Exited     n Prozent
##    <dbl> <int>   <dbl>
## 1      0  7963    79.6
## 2      1  2037    20.4
```

---
class: left

.blockquote[Case Study: Customer Retention]

## Further Tasks

1. Split the data set into training and test data (70/30).
2. Train a logistic regression model on the training data and generate predictions for the test data.
3. Create a confusion matrix and compute accuracy, precision, recall and the F1 score.
4. Discuss which of these metrics might be most important in this business context, and why.
5. Plot the ROC curve for your model, compute the AUC and interpret it.
6. Experiment with different thresholds and observe how precision and recall change.

---
class: inverse,center,middle

# Enjoy your evening!

---
background-image: url("data:image/png;base64,#http://bit.ly/cs631-donkey")
background-size: 80%
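
---
class: left

.blockquote[Case Study: Customer Retention]

## Further Tasks: A Possible Starting Point

For the further tasks (70/30 split, logistic regression, confusion matrix with accuracy, precision, recall and F1, AUC, varying thresholds), a minimal base-R sketch might look as follows. It is not a reference solution. Because the CSV file is not bundled with the slides, the sketch runs on a synthetic stand-in called `tbl_demo`, whose values and churn mechanism are simulated assumptions. `churn_metrics()` and `auc_rank()` are hypothetical helper names, and the two predictors are chosen only for illustration.

``` r
set.seed(42)

# Synthetic stand-in for tbl_churn (simulated values, NOT the real data);
# only the column names are taken from the case study.
n <- 1000
tbl_demo <- data.frame(
  Age            = round(runif(n, min = 18, max = 80)),
  IsActiveMember = rbinom(n, size = 1, prob = 0.5)
)
# Assumed churn mechanism, for illustration only
p_true <- plogis(-4 + 0.06 * tbl_demo$Age - 0.8 * tbl_demo$IsActiveMember)
tbl_demo$Exited <- rbinom(n, size = 1, prob = p_true)

# Task 1: 70/30 split into training and test data
idx_train <- sample(seq_len(n), size = floor(0.7 * n))
train <- tbl_demo[idx_train, ]
test  <- tbl_demo[-idx_train, ]

# Task 2: logistic regression on the training data,
# predicted churn probabilities for the test data
fit  <- glm(Exited ~ Age + IsActiveMember, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# Task 3: confusion-matrix counts and metrics for a given threshold
churn_metrics <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  tn <- sum(pred == 0 & actual == 0)
  precision <- tp / (tp + fp)   # NaN if no customer is predicted to churn
  recall    <- tp / (tp + fn)
  c(threshold = threshold,
    accuracy  = (tp + tn) / length(actual),
    precision = precision,
    recall    = recall,
    f1        = 2 * precision * recall / (precision + recall))
}

# Confusion matrix at the default threshold of 0.5
print(table(predicted = as.integer(prob >= 0.5), actual = test$Exited))

# Task 5 (AUC only): rank-based AUC, i.e. the probability that a randomly
# chosen churner gets a higher predicted probability than a non-churner
auc_rank <- function(actual, prob) {
  r  <- rank(prob)
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_val <- auc_rank(test$Exited, prob)
print(auc_val)

# Task 6: metrics for several thresholds
thresholds <- c(0.2, 0.3, 0.5)
metrics_by_threshold <- t(sapply(thresholds, function(th) {
  churn_metrics(test$Exited, prob, th)
}))
print(round(metrics_by_threshold, 3))
```

With the real data, `tbl_demo` would be replaced by `tbl_churn` and the formula extended with further predictors. Lowering the threshold can only keep recall equal or raise it, usually at the cost of precision, which is the trade-off behind tasks 4 and 6. The sketch computes the AUC but does not plot the ROC curve.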